1. Introduction

Although parametric models are prone to misspecification, they remain attractive
because they describe concisely the link between the past observations and the
predicted variable. Various parametric models have been proposed in recent
decades. For a review, see Brockwell and Davis (1991), Brockwell and Davis (1996),
Shumway and Stoffer (2001), and Tong (1990). Parameter estimation for linear
models has been widely studied, while for nonlinear models, because of their
complexity, it has generally been studied only in tractable cases. There is an
increasing interest in estimating the parameters of the ARCH and GARCH models
introduced respectively by Engle (1982) and Bollerslev (1986). Most of the
existing literature assumes a Gaussian error distribution and studies the
consistency and asymptotic normality of the conditional Gaussian likelihood
estimators. Some relevant papers are Engle (1982) and Weiss (1986) for ARCH
models and, for ARCH(∞) and/or GARCH models, Bollerslev (1986), Lumsdaine (1996),
Hyndman and Yao (2002), Francq and Zakoian (2004), Robinson and Zaffaroni (2006),
Straumann and Mikosch (2006), and Francq and Zakoian (2007). Other papers dealing
with parameter estimation in heteroscedastic models include the works of Giraitis
and Robinson (2001), who propose a Whittle estimation for a class of parametric
ARCH(∞) models, Chatterjee and Das (2003), who study estimators obtained by
minimizing certain functionals for ARCH models, Peng and Yao (2003), who propose
least absolute deviations estimators for ARCH and GARCH models, and Berkes and
Horváth (2004), who study likelihood estimators for GARCH models.

In the present paper, we study parameter estimation for more general
heteroscedastic models. Precisely, we consider the class of identifiable
parametric stochastic models

$$X_i = m(\rho_0; Z_{i-1}) + \sigma(\theta_0; Z_{i-1})\,\varepsilon_i, \quad i \in \mathbb{Z}, \tag{1.1}$$

where $(X_i)_{i\in\mathbb{Z}}$ is stationary and ergodic;
$(Z_{i-1} = (X_{i-1}, \ldots, X_{i-q+1}, X_{i-q}))_{i\in\mathbb{Z}}$ is a sequence of
$q$-dimensional vectors, with $q$ a nonnegative, possibly infinite, integer;
$(\varepsilon_i)_{i\in\mathbb{Z}}$ is a sequence of iid centered random variables with unit
variance such that $\varepsilon_i$ is independent of the $\sigma$-field $\sigma(Z_j, j < i)$;
the parameter column vector $\psi = (\rho', \theta')'$ belongs to
$\Psi = \Theta \times \widetilde{\Theta} \subset \mathbb{R}^I \times \mathbb{R}^J$, for some positive
integers $I$ and $J$; and the functions $m(\rho; z)$ and $\sigma(\theta; z)$ have known forms.
We aim to prove the existence of asymptotically normal estimators for the true
parameter vector $\psi_0 = (\rho_0', \theta_0')'$, and of uniformly consistent estimators
for the noise's density and its derivatives, when this density exists.

The class of models (1.1) contains models such as ARMA, EXPAR, ARCH, GARCH,
SETAR-ARCH, β-ARCH and many others. As far as the probabilistic properties of
these models are concerned, their invertibility is readily obtained, for example
when $\sigma > 0$. For some of them (see, e.g., Ngatchou-Wandji (2005)), a sufficient
condition for strict stationarity can be obtained, e.g., by checking the
conditions (S1)-(S4) of p. 86 in Taniguchi and Kakizawa (2000). The case of GARCH
models, which generalize ARCH models, has been studied by Chen and An (1998).
Next, a sufficient condition for geometric ergodicity can be obtained by applying
a result of Tjøstheim (1990), while for a particular class of β-ARCH models
nested in (1.1), this property has been investigated by An, Chen and Huang
(1997). Finally, it is possible that other interesting conditions for
stationarity and ergodicity can be obtained from the theory of Markov chains for
many models within (1.1).

Under mild conditions, a conditional least-squares estimator of $\rho_0$ is defined
in McKeague and Zhang (1994). Its consistency and asymptotic normality are
established. The same is done for $\theta_0$ in Ngatchou-Wandji (2002). Such results
have also been established for multivariate nonlinear AR models by Tjøstheim
(1986). Our main contribution is the study of the estimation of the couple of
parameters $\psi_0 = (\rho_0', \theta_0')'$ in model (1.1) by conditional least-squares and
conditional maximum likelihood methods, when the conditional distribution is not
necessarily normal and $q$ is possibly infinite. Our results generalize most of
those based on least-squares and pseudo- or quasi-likelihood estimation.

After the assumptions given in Section 2, we prove in Section 3 the existence of
a sequence of asymptotically normal conditional least-squares estimators for
$\psi_0$. Section 4 deals with the existence of conditional likelihood estimators for
this parameter. In Section 5 we give some common examples comprised in (1.1). In
Section 6 the estimation of the noise's density and its derivatives is
investigated. A simulation study in Section 7 ends our work.



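2. General assumptions

Throughout the text, the transpose of a vector or matrix function $H(x)$ is
denoted by $H'(x)$. Let $r$ be either $I$ or $J$. For given real functions $T(a; z)$
defined on a non-empty subset of $\mathbb{R}^r \times \mathbb{R}^q$ and $K(\psi; z)$ defined on a
non-empty subset of $\mathbb{R}^I \times \mathbb{R}^J \times \mathbb{R}^q$, we denote

$$\partial T(a; z) = \left(\frac{\partial T(a; z)}{\partial a_1}, \ldots, \frac{\partial T(a; z)}{\partial a_r}\right)', \qquad \partial^2 T(a; z) = \left(\frac{\partial^2 T(a; z)}{\partial a_\ell\,\partial a_k}\right)_{1 \le \ell, k \le r}.$$

For a vector or matrix function $H(x)$, we denote by $\partial' H(x)$ the transpose of
$\partial H(x)$. With this, we define
$\partial K(\psi; z) = (\partial'_\rho K(\psi; z); \partial'_\theta K(\psi; z))'$. We also define

$$\partial^2 K(\psi; z) = \begin{pmatrix} \partial^2_{\rho^2} K(\psi; z) & \partial^2_{\rho\theta} K(\psi; z) \\ \partial^2_{\theta\rho} K(\psi; z) & \partial^2_{\theta^2} K(\psi; z) \end{pmatrix}.$$

For a real-valued function $h$, $h^{(p)}$ denotes its $p$th order derivative, with
$h^{(0)} = h$. We denote by $\|V\|_E$ the Euclidean norm of the vector $V$ and by
$\|M\|_M = \max_{i,j} |M_{ij}|$ the norm of the square matrix $M = (M_{ij})$.

We next assume that the true parameter vector $\psi_0 = (\rho_0', \theta_0')'$ of (1.1) is
such that $\rho_0 \in \mathrm{int}(\Theta)$ and $\theta_0 \in \mathrm{int}(\widetilde{\Theta})$, where $\mathrm{int}(\Theta)$ and
$\mathrm{int}(\widetilde{\Theta})$ denote respectively the nonempty interiors of $\Theta$ and
$\widetilde{\Theta}$. We also suppose that all the random variables in this paper are
defined on the same probability space $(\Omega, \mathcal{W}, P)$, where $\Omega$ is a set,
$\mathcal{W}$ a $\sigma$-field of subsets of $\Omega$ and $P$ a probability measure on
$\mathcal{W}$. The following assumptions are needed: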

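(A1) The common fourth-order moment of the $\varepsilon_i$'s is finite.

(A2) The functions $m(\rho; z)$ and $\sigma(\theta; z)$ are each twice continuously
differentiable with respect to $\rho$ and $\theta$ respectively, and there exists a
positive function $\alpha(z)$ such that

$$\max\Big\{\sup_{\rho}|m(\rho; z)|,\ \sup_{\rho}\|\partial m(\rho; z)\|_E,\ \sup_{\rho}\|\partial^2 m(\rho; z)\|_M,\ \sup_{\theta}|\sigma(\theta; z)|,\ \sup_{\theta}\|\partial\sigma(\theta; z)\|_E,\ \sup_{\theta}\|\partial^2\sigma(\theta; z)\|_M\Big\} \le \alpha(z).$$

(A3) There exists a positive function $\beta(z)$ such that $E[\beta^4(Z_0)] < \infty$ and
for all $\rho_1, \rho_2 \in \Theta$ and $\theta_1, \theta_2 \in \widetilde{\Theta}$,

$$\max\big\{|m(\rho_1; z) - m(\rho_2; z)|,\ \|\partial m(\rho_1; z) - \partial m(\rho_2; z)\|_E,\ \|\partial^2 m(\rho_1; z) - \partial^2 m(\rho_2; z)\|_M,$$
$$|\sigma(\theta_1; z) - \sigma(\theta_2; z)|,\ \|\partial\sigma(\theta_1; z) - \partial\sigma(\theta_2; z)\|_E,\ \|\partial^2\sigma(\theta_1; z) - \partial^2\sigma(\theta_2; z)\|_M\big\} \le \beta(z)\,\min\{\|\rho_1 - \rho_2\|_E, \|\theta_1 - \theta_2\|_E\}.$$

Assumption (A1) is at least satisfied by Gaussian and Student $\varepsilon_i$'s. One can
find in the literature a number of models with functions $m(\rho; z)$ and
$\sigma(\theta; z)$ satisfying (A2) and (A3) (see, e.g., Ngatchou-Wandji (2005)).

3. Conditional least-squares estimation

The purpose of this section is to study the existence of estimators for
$\psi_0 = (\rho_0', \theta_0')'$ by a conditional least-squares method. Recall that the
conditional mean and the conditional variance functions of (1.1) are almost
surely defined for all $z \in \mathbb{R}^q$ by $E(X_1 \mid Z_0 = z) = m(\rho_0; z)$ and
$E\{[X_1 - m(\rho_0; Z_0)]^2 \mid Z_0 = z\} = \sigma^2(\theta_0; z)$. From these equalities, for any
bounded measurable functions $\gamma(z)$ and $\lambda(z)$, we have
$E[(X_1 - m(\rho_0; Z_0))\lambda(Z_0)] = 0$ and
$E[\{(X_1 - m(\rho_0; Z_0))^2 - \sigma^2(\theta_0; Z_0)\}\gamma(Z_0)] = 0$. For estimating $\psi_0$, our
idea is to search for the zeros of the gradients of the sample variances of the
sequences of centered random variables $(X_i - m(\rho; Z_{i-1}))$, $i = 1, \ldots, n$, and
$([X_i - m(\rho; Z_{i-1})]^2 - \sigma^2(\theta; Z_{i-1}))$, $i = 1, \ldots, n$.

Given $X_{-q}, \ldots, X_0, X_1, \ldots, X_n$, denote
$\mathbf{X}_n = (X_n, \ldots, X_1, X_0, \ldots, X_{-q})$ and define the sequences of random
functions

$$U_n(\rho; \mathbf{X}_n) = \sum_{i=1}^n \lambda^2(Z_{i-1})\,[X_i - m(\rho; Z_{i-1})]^2, \tag{3.1}$$

$$S_n(\psi; \mathbf{X}_n) = \sum_{i=1}^n \gamma^2(Z_{i-1})\,\big\{[X_i - m(\rho; Z_{i-1})]^2 - \sigma^2(\theta; Z_{i-1})\big\}^2, \tag{3.2}$$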
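and the matrices

$$\Phi_{11} = 2E[\lambda^2(Z_0)\,\partial m(\rho_0; Z_0)\,\partial' m(\rho_0; Z_0)],$$
$$\Phi_{22} = 8E[\gamma^2(Z_0)\,\sigma^2(\theta_0; Z_0)\,\partial\sigma(\theta_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)],$$

assumed to be positive definite. Define also the matrices

$$\Lambda_{11} = 4E[\lambda^4(Z_0)\,\sigma^2(\theta_0; Z_0)\,(\Phi_{11}^{-1})'\,\partial m(\rho_0; Z_0)\,\partial' m(\rho_0; Z_0)\,\Phi_{11}^{-1}],$$
$$\Lambda_{12} = \Lambda_{21}' = 8E[\lambda^2(Z_0)\,\gamma^2(Z_0)\,\sigma^4(\theta_0; Z_0)\,(\Phi_{11}^{-1})'\,\partial m(\rho_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)\,\Phi_{22}^{-1}]\;E[\varepsilon_0(\varepsilon_0^2 - 1)],$$
$$\Lambda_{22} = 16E[\gamma^4(Z_0)\,\sigma^6(\theta_0; Z_0)\,(\Phi_{22}^{-1})'\,\partial\sigma(\theta_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)\,\Phi_{22}^{-1}]\;E[(\varepsilon_0^2 - 1)^2],$$

and

$$\Lambda = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}.$$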



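Theorem 3.1. Assume that the assumptions (A1)-(A3) hold and that $\Lambda$ is positive
definite. Then,

(i) there exists a sequence of estimators $\psi_n = (\rho_n', \theta_n')'$ such that
$\psi_n \to \psi_0$ almost surely, and for any $\epsilon > 0$, there exists an event
$\mathcal{S}_1$ with $P(\mathcal{S}_1) > 1 - \epsilon$ and a nonnegative integer $n_1$ such that on
$\mathcal{S}_1$, for $n > n_1$,

• $\partial U_n(\rho_n; \mathbf{X}_n) = 0$ and $U_n(\rho; \mathbf{X}_n)$ attains a relative minimum at
$\rho = \rho_n$;

• assuming $\rho_n$ fixed, $\partial_\theta S_n((\rho_n, \theta_n); \mathbf{X}_n) = 0$ and
$S_n((\rho_n, \theta); \mathbf{X}_n)$ attains a relative minimum at $\theta = \theta_n$.

(ii) $n^{1/2}(\psi_n - \psi_0) \to N(0, \Lambda)$ in distribution.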

Proof. It suffices to check the hypotheses of Theorem 3.2.23 of Taniguchi and
Kakizawa (2000), established by Klimko and Nelson (1978) by using Egorov's
theorem (see, e.g., Taniguchi and Kakizawa (2000), p. 97). From simple
computations one obtains:

$$\partial U_n(\rho; \mathbf{X}_n) = -2\sum_{i=1}^n \lambda^2(Z_{i-1})\,\partial m(\rho; Z_{i-1})\,[X_i - m(\rho; Z_{i-1})]$$

and

$$\partial^2 U_n(\rho; \mathbf{X}_n) = 2\sum_{i=1}^n \lambda^2(Z_{i-1})\Big(\partial m(\rho; Z_{i-1})\,\partial' m(\rho; Z_{i-1}) - \partial^2 m(\rho; Z_{i-1})\,[X_i - m(\rho; Z_{i-1})]\Big).$$

By ergodicity, it is immediate that, as $n$ tends to infinity, almost surely,

$$\frac{1}{n}\,\partial U_n(\rho_0; \mathbf{X}_n) \longrightarrow 0 \quad\text{and}\quad \frac{1}{n}\,\partial^2 U_n(\rho_0; \mathbf{X}_n) \longrightarrow \Phi_{11}.$$

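For any vector $\rho_* \in \mathrm{int}(\Theta)$ define the sequence of random matrix functions

$$V_n(\rho_*; \mathbf{X}_n) = \partial^2 U_n(\rho_*; \mathbf{X}_n) - \partial^2 U_n(\rho_0; \mathbf{X}_n),$$

and denote by $V_n(\rho_*; \mathbf{X}_n)_{\ell k}$ its $(\ell, k)$th entry. Then,

$$V_n(\rho_*; \mathbf{X}_n)_{\ell k} = 2\sum_{i=1}^n \lambda^2(Z_{i-1})\Big\{\Big(\frac{\partial m(\rho_*; Z_{i-1})}{\partial\rho_\ell}\frac{\partial m(\rho_*; Z_{i-1})}{\partial\rho_k} - \frac{\partial m(\rho_0; Z_{i-1})}{\partial\rho_\ell}\frac{\partial m(\rho_0; Z_{i-1})}{\partial\rho_k}\Big)$$
$$\qquad - \Big(\frac{\partial^2 m(\rho_*; Z_{i-1})}{\partial\rho_\ell\,\partial\rho_k}[X_i - m(\rho_*; Z_{i-1})] - \frac{\partial^2 m(\rho_0; Z_{i-1})}{\partial\rho_\ell\,\partial\rho_k}[X_i - m(\rho_0; Z_{i-1})]\Big)\Big\}.$$

Thus, in view of (A1)-(A3), it is easy to see that there exists a positive
real-valued function $\nu_{\ell k}(z)$ with $E[\nu_{\ell k}(Z_0)] < \infty$ such that

$$|V_n(\rho_*; \mathbf{X}_n)_{\ell k}| \le \|\rho_* - \rho_0\|_E \sum_{i=1}^n \nu_{\ell k}(Z_{i-1}).$$

Now for $\delta > 0$ such that $\|\rho - \rho_0\|_E < \delta$, and for $\rho_*$ lying between $\rho$ and
$\rho_0$, we have by the above inequality that:

$$\frac{1}{n\delta}\,|V_n(\rho_*; \mathbf{X}_n)_{\ell k}| \le \frac{1}{n\delta}\,\|\rho_* - \rho_0\|_E \sum_{i=1}^n \nu_{\ell k}(Z_{i-1}) \le \frac{1}{n}\sum_{i=1}^n \nu_{\ell k}(Z_{i-1}).$$

Next, by ergodicity, the right-hand side of the last inequality converges a.s. to
$E[\nu_{\ell k}(Z_0)] < \infty$ as $n$ tends to infinity. It is then clear that for any
$(\ell, k)$,

$$\lim_{n\to\infty}\ \sup_{\delta\to 0}\ \frac{1}{n\delta}\,|V_n(\rho_*; \mathbf{X}_n)_{\ell k}| < \infty. \tag{3.3}$$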

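From Theorem 3.2.23 of Taniguchi and Kakizawa (2000), it follows that there
exists a sequence of estimators $\rho_n$ such that $\rho_n \to \rho_0$ almost surely as
$n \to \infty$, and for $\epsilon > 0$, one can find an event $E_1$ with $P(E_1) > 1 - \epsilon$ and a
nonnegative integer $h_1$ such that on $E_1$, for $n > h_1$,
$\partial U_n(\rho_n; \mathbf{X}_n) = 0$ and $U_n(\rho; \mathbf{X}_n)$ attains a relative minimum at
$\rho = \rho_n$. The first part of (i) is then handled. For the second part, for fixed
$\rho_n$, we have from simple computations:

$$\partial_\theta S_n((\rho_n, \theta_0); \mathbf{X}_n) = -4\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1}) \times \big\{[X_i - m(\rho_n; Z_{i-1})]^2 - \sigma^2(\theta_0; Z_{i-1})\big\}$$
$$= -4\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1}) \times \big\{\sigma^2(\theta_0; Z_{i-1})(\varepsilon_i^2 - 1) + 2\sigma(\theta_0; Z_{i-1})\,\varepsilon_i\,[m(\rho_0; Z_{i-1}) - m(\rho_n; Z_{i-1})]$$
$$\qquad + [m(\rho_0; Z_{i-1}) - m(\rho_n; Z_{i-1})]^2\big\} \tag{3.4}$$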
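and

$$\partial^2_{\theta^2} S_n((\rho_n, \theta_0); \mathbf{X}_n) = 8\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma^2(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1})$$
$$\qquad - 4\sum_{i=1}^n \gamma^2(Z_{i-1})\big\{\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1}) + \sigma(\theta_0; Z_{i-1})\,\partial^2\sigma(\theta_0; Z_{i-1})\big\} \times \big([X_i - m(\rho_n; Z_{i-1})]^2 - \sigma^2(\theta_0; Z_{i-1})\big)$$
$$= 8\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma^2(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1})$$
$$\qquad - 4\sum_{i=1}^n \gamma^2(Z_{i-1})\big\{\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1}) + \sigma(\theta_0; Z_{i-1})\,\partial^2\sigma(\theta_0; Z_{i-1})\big\}$$
$$\qquad \times \big\{\sigma^2(\theta_0; Z_{i-1})(\varepsilon_i^2 - 1) + 2\sigma(\theta_0; Z_{i-1})\,\varepsilon_i\,[m(\rho_0; Z_{i-1}) - m(\rho_n; Z_{i-1})] + [m(\rho_0; Z_{i-1}) - m(\rho_n; Z_{i-1})]^2\big\}. \tag{3.5}$$

In view of (A1)-(A3), applying the mean value theorem to (3.4) and (3.5), it is
clear by ergodicity that, as $n$ tends to infinity, almost surely,

$$\frac{1}{n}\,\partial_\theta S_n((\rho_n, \theta_0); \mathbf{X}_n) \longrightarrow 0 \quad\text{and}\quad \frac{1}{n}\,\partial^2_{\theta^2} S_n((\rho_n, \theta_0); \mathbf{X}_n) \longrightarrow \Phi_{22}.$$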

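For any vector $\theta_* \in \mathrm{int}(\widetilde{\Theta})$ define the sequence of random functions

$$T_n(\theta_*; \mathbf{X}_n) = \partial^2_{\theta^2} S_n(\theta_*; \mathbf{X}_n) - \partial^2_{\theta^2} S_n(\theta_0; \mathbf{X}_n),$$

and denote by $T_n(\theta_*; \mathbf{X}_n)_{\ell k}$ its $(\ell, k)$th entry. Then

$$T_n(\theta_*; \mathbf{X}_n)_{\ell k} = 8\sum_{i=1}^n \gamma^2(Z_{i-1})\Big\{\sigma^2(\theta_*; Z_{i-1})\frac{\partial\sigma(\theta_*; Z_{i-1})}{\partial\theta_\ell}\frac{\partial\sigma(\theta_*; Z_{i-1})}{\partial\theta_k} - \sigma^2(\theta_0; Z_{i-1})\frac{\partial\sigma(\theta_0; Z_{i-1})}{\partial\theta_\ell}\frac{\partial\sigma(\theta_0; Z_{i-1})}{\partial\theta_k}\Big\}$$
$$\qquad - 4\sum_{i=1}^n \gamma^2(Z_{i-1})\Big\{\Big[\Big(\frac{\partial\sigma(\theta_*; Z_{i-1})}{\partial\theta_\ell}\frac{\partial\sigma(\theta_*; Z_{i-1})}{\partial\theta_k} + \sigma(\theta_*; Z_{i-1})\frac{\partial^2\sigma(\theta_*; Z_{i-1})}{\partial\theta_\ell\,\partial\theta_k}\Big)\big([X_i - m(\rho_n; Z_{i-1})]^2 - \sigma^2(\theta_*; Z_{i-1})\big)\Big]$$
$$\qquad - \Big[\Big(\frac{\partial\sigma(\theta_0; Z_{i-1})}{\partial\theta_\ell}\frac{\partial\sigma(\theta_0; Z_{i-1})}{\partial\theta_k} + \sigma(\theta_0; Z_{i-1})\frac{\partial^2\sigma(\theta_0; Z_{i-1})}{\partial\theta_\ell\,\partial\theta_k}\Big)\big([X_i - m(\rho_n; Z_{i-1})]^2 - \sigma^2(\theta_0; Z_{i-1})\big)\Big]\Big\}.$$

In view of (A1)-(A3), it is easy to see that there exists a positive real-valued
function $\varrho_{\ell k}(z)$ with $E[\varrho_{\ell k}(Z_0)] < \infty$ such that

$$|T_n(\theta_*; \mathbf{X}_n)_{\ell k}| \le \|\theta_* - \theta_0\|_E \sum_{i=1}^n \varrho_{\ell k}(Z_{i-1}).$$

Again, for $\delta > 0$ such that $\|\theta - \theta_0\|_E < \delta$, and for $\theta_*$ lying between $\theta$ and
$\theta_0$, we have from above that:

$$\frac{1}{n\delta}\,|T_n(\theta_*; \mathbf{X}_n)_{\ell k}| \le \frac{1}{n\delta}\,\|\theta_* - \theta_0\|_E \sum_{i=1}^n \varrho_{\ell k}(Z_{i-1}) \le \frac{1}{n}\sum_{i=1}^n \varrho_{\ell k}(Z_{i-1}).$$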



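It is easy to see that, by the ergodic theorem, the right-hand side of the last
inequality converges almost surely to a finite limit as $n$ tends to infinity. It
is then clear that for any $(\ell, k)$, (3.3) holds with $T_n(\theta_*; \mathbf{X}_n)_{\ell k}$.
Whence, applying Theorem 3.2.23 of Taniguchi and Kakizawa (2000), one can find a
sequence of estimators $\theta_n$ such that $\theta_n \to \theta_0$ almost surely as $n \to \infty$, and
for $\epsilon > 0$, one can find an event $E_2$ with $P(E_2) > 1 - \epsilon$ and a nonnegative
integer $h_2$ such that on $E_1 \cap E_2$, for $n > h_2$,
$\partial_\theta S_n(\psi_n; \mathbf{X}_n) = 0$ and $S_n((\rho_n, \theta); \mathbf{X}_n)$ attains a relative minimum
at $\theta = \theta_n$. The events $E_1$ and $E_2$ can be chosen so that, for all $\epsilon > 0$,
$P(E_1 \cap E_2) > 1 - \epsilon$. Thus taking $\mathcal{S}_1 = E_1 \cap E_2$ and
$n_1 = \max(h_1, h_2)$, where $h_1$ denotes the integer associated with $E_1$, yields
the first part of Theorem 3.1. To handle the second point we observe that

$$\frac{1}{\sqrt{n}}\,\partial U_n(\rho_0; \mathbf{X}_n) = -\frac{2}{\sqrt{n}}\sum_{i=1}^n \lambda^2(Z_{i-1})\,\partial m(\rho_0; Z_{i-1})\,\sigma(\theta_0; Z_{i-1})\,\varepsilon_i,$$

and by a Taylor expansion of order one of the function $\partial U_n(\rho; \mathbf{X}_n)$ around
$\rho_0$, for large values of $n$, one can write

$$\sqrt{n}(\rho_n - \rho_0) = \frac{2}{\sqrt{n}}\sum_{i=1}^n \lambda^2(Z_{i-1})\,\sigma(\theta_0; Z_{i-1})\,\varepsilon_i\,\partial' m(\rho_0; Z_{i-1})\,\Phi_{11}^{-1} + o_P(1).$$

One can also observe that

$$\frac{1}{\sqrt{n}}\,\partial_\theta S_n((\rho_n, \theta_0); \mathbf{X}_n) = -\frac{4}{\sqrt{n}}\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma^3(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1})\,(\varepsilon_i^2 - 1) + o_P(1)$$

and write, for large values of $n$,

$$\sqrt{n}(\theta_n - \theta_0) = \frac{4}{\sqrt{n}}\sum_{i=1}^n \gamma^2(Z_{i-1})\,\sigma^3(\theta_0; Z_{i-1})\,(\varepsilon_i^2 - 1)\,\partial'\sigma(\theta_0; Z_{i-1})\,\Phi_{22}^{-1} + o_P(1).$$

Then putting in Theorem 1 of Ngatchou-Wandji (2005): $U_i = \varepsilon_i$, $Y_i = Z_{i-1}$,
$T_1(x) = x$, $T_2(x) = x^2 - 1$,
$\Pi_1(z) = 2\lambda^2(z)\,\sigma(\theta_0; z)\,\partial' m(\rho_0; z)\,\Phi_{11}^{-1}$, and
$\Pi_2(z) = 4\gamma^2(z)\,\sigma^3(\theta_0; z)\,\partial'\sigma(\theta_0; z)\,\Phi_{22}^{-1}$, it results that

$$n^{1/2}(\psi_n - \psi_0) \longrightarrow N(0, \Lambda) \quad\text{in distribution.}$$

□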
Corollary 3.1. Assume that the assumptions of Theorem 3.1 hold and that
$E[\varepsilon_0(\varepsilon_0^2 - 1)] = 0$. Then $\rho_n$ and $\theta_n$ are asymptotically uncorrelated.

Remark 3.1. The condition $E[\varepsilon_0(\varepsilon_0^2 - 1)] = E(\varepsilon_0^3) = 0$ in the above
corollary holds for symmetric densities. If it does not hold, $\rho_n$ and $\theta_n$ will
not be independent in general. This fact is ignored when the estimation is done,
for example, with Gaussian or Student $\varepsilon_i$'s.

Remark 3.2. One could also prove the existence of consistent conditional
least-squares type estimators for $\psi_0$ by using directly the random function
$S_n(\psi; \mathbf{X}_n)$. It is clear that this would have provided a one-step
estimator. However, one may not retrieve the classical least-squares estimators.
For example, in the very simple case of $m(\rho; z) = \rho z$ and $\sigma(\theta; z) = 1$, it is
very difficult to have a simple expression for $\rho_n$ by minimizing
$S_n((\rho, 1); \mathbf{X}_n)$, whereas the preceding two-step method yields the
traditional least-squares estimator of $\rho_0$. Another approach (see, e.g., Heyde
(1997)) consists in minimizing the sum of squares
$\sum_{i=1}^n \{[X_i - m(\rho; Z_{i-1})]/\sigma(\theta; Z_{i-1})\}^2$. Yet, for the special case of
$\sigma(\theta; z) = \theta^{1/2}$ and $\rho \in \mathbb{R}$, it is not clear how to estimate $\theta_0$ when
$\rho_0 \neq \theta_0$.
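
As a small illustration of the two-step idea in the simple case $m(\rho; z) = \rho z$, $\sigma(\theta; z) = 1$ and weight $\lambda \equiv 1$, the following Python sketch evaluates a first-step criterion and its closed-form minimizer, which is the classical least-squares estimator. The criterion is assumed here to be the sum of squared residuals, the helper names `u_n` and `cls_ar1` are hypothetical, and the toy series is noiseless and chosen only so that the answer is easy to check by hand.

```python
def u_n(rho, x):
    """First-step criterion with lambda = 1 and m(rho; z) = rho * z (assumed form)."""
    return sum((x[i] - rho * x[i - 1]) ** 2 for i in range(1, len(x)))


def cls_ar1(x):
    """Zero of the gradient of u_n in rho: the classical least-squares estimator."""
    num = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    den = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return num / den


# Noiseless toy series generated with rho = 0.5 (illustrative data only).
x = [1.0, 0.5, 0.25, 0.125]
rho_hat = cls_ar1(x)
print(rho_hat, u_n(rho_hat, x))
```

With noisy data the same closed form applies; only the value of the criterion at the minimizer is no longer zero.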

Remark 3.3. The choice of the functions $\gamma(z)$ and $\lambda(z)$ is an open problem
and we will not try to tackle it here. In the simulation, they are taken
constant. Although this choice may be sub-optimal, it matches what is done in the
literature.



4. Conditional likelihood estimation

For models such as (1.1), the density function of the noise can be useful for
writing the likelihood and/or conditional likelihood functions. In practice, to
choose this density function, (1.1) can first be fitted by least-squares methods.
Next, various tests can be applied to the residuals from the fitted model to help
postulate an adequate density function $f$ (not necessarily Gaussian) for the
noise. However, because it facilitates parameter estimation, the
pseudo-likelihood estimation method is very popular in practice. This probably
explains the huge literature on the subject (see the references given in
Section 1).

In this section, we study the conditional likelihood estimation of the
parameters when the noise has a not necessarily Gaussian density function $f$.
This work has been done by Berkes and Horváth (2004) in the case of GARCH models.
For simplicity, we restrict our study to models (1.1) for which the function
$\sigma(\theta; z)$ satisfies:

(B0) For all $(\theta, z) \in \mathbb{R}^J \times \mathbb{R}^q$, $\sigma(\theta; z) > \eta$, for some constant $\eta > 0$.

Under (B0), the log-likelihood of a given sample
$\mathbf{X}_n = (X_n, \ldots, X_1, X_0, X_{-1}, \ldots, X_{-q})$ conditional on $Z_0$ is

$$L_n(\psi; \mathbf{X}_n) = \sum_{i=1}^n \big\{-\log[\sigma(\theta; Z_{i-1})] + \log[f(\varepsilon_i(\psi))]\big\}, \tag{4.1}$$

where we recall that $\psi = (\rho', \theta')' \in \Psi$ and, for all $i \in \mathbb{Z}$,
$\varepsilon_i(\psi) = [X_i - m(\rho; Z_{i-1})]/\sigma(\theta; Z_{i-1})$. For the derivation of the results
of this section, we make the following assumptions on the density function $f$:

(B1) $f(x) > 0$ for all $x \in \mathbb{R}$, and $f$ is twice differentiable.
(B2) $\phi_f = -f^{(1)}/f$ is differentiable with continuous derivative.
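
To make the conditional log-likelihood (4.1) concrete, here is a minimal Python sketch for a one-lag model ($q = 1$), in which the conditional mean, the conditional scale and the log-density of the noise are passed as functions. The names `cond_loglik` and `log_gauss` are hypothetical, and the ARCH(1)-type scale with coefficients 0.5 and 0.3 is only an illustrative assumption.

```python
import math


def cond_loglik(x, m, sigma, log_f):
    """Sum over i of -log sigma(Z_{i-1}) + log f(eps_i), with Z_{i-1} = x[i-1]."""
    total = 0.0
    for i in range(1, len(x)):
        z = x[i - 1]
        s = sigma(z)
        eps = (x[i] - m(z)) / s          # standardized residual eps_i(psi)
        total += -math.log(s) + log_f(eps)
    return total


def log_gauss(e):
    """Log-density of the standard Gaussian distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * e * e


# Illustrative ARCH(1)-type specification: zero mean, variance 0.5 + 0.3 * z^2.
value = cond_loglik(
    [0.0, 0.0],
    m=lambda z: 0.0,
    sigma=lambda z: math.sqrt(0.5 + 0.3 * z * z),
    log_f=log_gauss,
)
print(value)
```

Passing another `log_f` (for instance a Laplace log-density) changes the noise model without touching the rest of the function.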

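Next, for all $i \in \mathbb{Z}$, we define on $\mathbb{R}^I \times \mathbb{R}^J$ the following random
functions:

$$\xi_i(\psi) = \phi_f(\varepsilon_i(\psi)),$$
$$\zeta_i(\psi) = \varepsilon_i(\psi)\,\phi_f(\varepsilon_i(\psi)),$$
$$\dot{\xi}_i(\psi) = \phi_f^{(1)}(\varepsilon_i(\psi)),$$
$$\dot{\zeta}_i(\psi) = \varepsilon_i(\psi)\,\phi_f^{(1)}(\varepsilon_i(\psi)),$$
$$\ddot{\xi}_i(\psi) = \zeta_i(\psi) + \varepsilon_i(\psi)\,\dot{\zeta}_i(\psi).$$

We also need the following additional requirements:

(B3) There exists a positive function $v(z)$ such that $E[v^4(Z_0)] < \infty$ and for
all $i \in \mathbb{Z}$ and $\psi_1, \psi_2 \in \Psi$, a.s.,

$$\max\big\{|\xi_i(\psi_1) - \xi_i(\psi_2)|,\ |\zeta_i(\psi_1) - \zeta_i(\psi_2)|,\ |\dot{\xi}_i(\psi_1) - \dot{\xi}_i(\psi_2)|,\ |\dot{\zeta}_i(\psi_1) - \dot{\zeta}_i(\psi_2)|,\ |\ddot{\xi}_i(\psi_1) - \ddot{\xi}_i(\psi_2)|\big\} \le v(Z_{i-1})\,\|\psi_1 - \psi_2\|_E.$$

(B4) There exists a positive function $t(z)$ such that $E[t^4(Z_0)] < \infty$ and for
all $i \in \mathbb{Z}$, a.s.,

$$\max\Big\{\sup_{\psi\in\Psi}|\xi_i(\psi)|,\ \sup_{\psi\in\Psi}|\zeta_i(\psi)|,\ \sup_{\psi\in\Psi}|\dot{\xi}_i(\psi)|,\ \sup_{\psi\in\Psi}|\dot{\zeta}_i(\psi)|,\ \sup_{\psi\in\Psi}|\ddot{\xi}_i(\psi)|\Big\} \le t(Z_{i-1}).$$

Such assumptions have been made in Ngatchou-Wandji (2005). They are at least
satisfied by linear autoregressive models, EXPAR and TAR models, ARCH and more
generally β-ARCH models, with Gaussian $f$.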

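Define the matrices

$$\Sigma_{11} = E[\sigma^{-2}(\theta_0; Z_0)\,\partial m(\rho_0; Z_0)\,\partial' m(\rho_0; Z_0)] \int \phi_f^{(1)}(x) f(x)\,dx,$$
$$\Sigma_{12} = \Sigma_{21}' = E[\sigma^{-2}(\theta_0; Z_0)\,\partial m(\rho_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)] \int x\,\phi_f^{(1)}(x) f(x)\,dx,$$
$$\Sigma_{22} = E[\sigma^{-2}(\theta_0; Z_0)\,\partial\sigma(\theta_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)] \int x\,\big(\phi_f(x) + x\,\phi_f^{(1)}(x)\big) f(x)\,dx,$$
$$\Lambda_{11} = E[\sigma^{-2}(\theta_0; Z_0)\,\partial m(\rho_0; Z_0)\,\partial' m(\rho_0; Z_0)] \int \phi_f^2(x) f(x)\,dx,$$
$$\Lambda_{12} = \Lambda_{21}' = E[\sigma^{-2}(\theta_0; Z_0)\,\partial\sigma(\theta_0; Z_0)\,\partial' m(\rho_0; Z_0)] \int \phi_f(x)\,\big(x\phi_f(x) - 1\big) f(x)\,dx,$$
$$\Lambda_{22} = E[\sigma^{-2}(\theta_0; Z_0)\,\partial\sigma(\theta_0; Z_0)\,\partial'\sigma(\theta_0; Z_0)] \int \big(x\phi_f(x) - 1\big)^2 f(x)\,dx,$$

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix}.$$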

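Theorem 4.1. Assume that (A1)-(A3) and (B0)-(B4) hold, and that the matrix $\Sigma$
is negative definite. If $\int \phi_f(x) f(x)\,dx = 0$ and $\int x\,\phi_f(x) f(x)\,dx = 1$, then

(i) there exists a sequence of estimators $\psi_n = (\rho_n', \theta_n')'$ such that
$\psi_n \to \psi_0$ almost surely, and for any $\epsilon > 0$, there exists an event
$\mathcal{S}_2$ with $P(\mathcal{S}_2) > 1 - \epsilon$ and a nonnegative integer $n_2$ such that on
$\mathcal{S}_2$, for $n > n_2$, $\partial L_n(\psi_n; \mathbf{X}_n) = 0$ and $L_n(\psi; \mathbf{X}_n)$
attains a relative maximum at $\psi = \psi_n$;

(ii) $\sqrt{n}(\psi_n - \psi_0) \to N(0, \Sigma^{-1}\Lambda\Sigma^{-1})$ in distribution.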

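Proof. The tools for the proof are exactly the same as those of the proof of
Theorem 3.1. Define $Q_n(\psi; \mathbf{X}_n) = -L_n(\psi; \mathbf{X}_n)$. Then

$$\partial_\rho Q_n(\psi_0; \mathbf{X}_n) = -\sum_{i=1}^n \sigma^{-1}(\theta_0; Z_{i-1})\,\xi_i(\psi_0)\,\partial m(\rho_0; Z_{i-1}),$$
$$\partial_\theta Q_n(\psi_0; \mathbf{X}_n) = -\sum_{i=1}^n \sigma^{-1}(\theta_0; Z_{i-1})\,\big(\zeta_i(\psi_0) - 1\big)\,\partial\sigma(\theta_0; Z_{i-1}),$$
$$\partial^2_{\rho^2} Q_n(\psi_0; \mathbf{X}_n) = -\sum_{i=1}^n \sigma^{-1}(\theta_0; Z_{i-1})\Big(\partial^2 m(\rho_0; Z_{i-1})\,\xi_i(\psi_0) - \sigma^{-1}(\theta_0; Z_{i-1})\,\partial m(\rho_0; Z_{i-1})\,\partial' m(\rho_0; Z_{i-1})\,\dot{\xi}_i(\psi_0)\Big),$$
$$\partial^2_{\theta\rho} Q_n(\psi_0; \mathbf{X}_n) = \sum_{i=1}^n \sigma^{-2}(\theta_0; Z_{i-1})\,\{\xi_i(\psi_0) + \dot{\zeta}_i(\psi_0)\}\,\partial m(\rho_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1}),$$
$$\partial^2_{\rho\theta} Q_n(\psi_0; \mathbf{X}_n) = \sum_{i=1}^n \sigma^{-2}(\theta_0; Z_{i-1})\,\{\xi_i(\psi_0) + \dot{\zeta}_i(\psi_0)\}\,\partial\sigma(\theta_0; Z_{i-1})\,\partial' m(\rho_0; Z_{i-1}),$$
$$\partial^2_{\theta^2} Q_n(\psi_0; \mathbf{X}_n) = \sum_{i=1}^n \Big[\sigma^{-1}(\theta_0; Z_{i-1})\big\{\sigma^{-1}(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1}) - \partial^2\sigma(\theta_0; Z_{i-1})\big\}\big(\zeta_i(\psi_0) - 1\big)$$
$$\qquad + \sigma^{-2}(\theta_0; Z_{i-1})\,\partial\sigma(\theta_0; Z_{i-1})\,\partial'\sigma(\theta_0; Z_{i-1})\,\ddot{\xi}_i(\psi_0)\Big],$$

where, as in the derivation, $\xi_i = \phi_f(\varepsilon_i)$, $\zeta_i = \varepsilon_i\phi_f(\varepsilon_i)$,
$\dot{\xi}_i = \phi_f^{(1)}(\varepsilon_i)$, $\dot{\zeta}_i = \varepsilon_i\phi_f^{(1)}(\varepsilon_i)$ and
$\ddot{\xi}_i = \zeta_i + \varepsilon_i\dot{\zeta}_i$. It is easy to see that, as $n$ tends to infinity,

$$\frac{1}{n}\,\partial Q_n(\psi_0; \mathbf{X}_n) \longrightarrow 0 \quad\text{and}\quad \frac{1}{n}\,\partial^2 Q_n(\psi_0; \mathbf{X}_n) \longrightarrow -\Sigma = \widetilde{\Sigma}.$$

It is clear that the matrix $\widetilde{\Sigma}$ is positive definite. For any vector
$\psi_* \in \mathrm{int}(\Psi)$, define the sequence of random functions

$$T_n(\psi_*; \mathbf{X}_n) = \partial^2 Q_n(\psi_*; \mathbf{X}_n) - \partial^2 Q_n(\psi_0; \mathbf{X}_n),$$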
and denote by $T_n(\psi_*; \mathbf{X}_n)_{\ell k}$ its $(\ell, k)$th entry. Any entry of
$\partial^2 Q_n(\psi_0; \mathbf{X}_n)$ is either a constant times the sum over $i = 1, \ldots, n$
of the product of the components or entries of $\partial m(\rho_0; Z_{i-1})$,
$\partial^2 m(\rho_0; Z_{i-1})$, $\partial\sigma(\theta_0; Z_{i-1})$, $\partial^2\sigma(\theta_0; Z_{i-1})$ and the random
functions $\sigma(\theta_0; Z_{i-1})$, $\varepsilon_i(\psi_0)$ and the functions of $\varepsilon_i(\psi_0)$ built from
$\phi_f$ and $\phi_f^{(1)}$ defined before (B3), or sums or differences of such terms.
We have for example:

$$\partial^2 Q_n(\psi_0; \mathbf{X}_n)_{12} = -\sum_{i=1}^n \sigma^{-1}(\theta_0; Z_{i-1})\Big(\frac{\partial^2 m(\rho_0; Z_{i-1})}{\partial\rho_1\,\partial\rho_2}\,\phi_f(\varepsilon_i(\psi_0)) - \sigma^{-1}(\theta_0; Z_{i-1})\,\frac{\partial m(\rho_0; Z_{i-1})}{\partial\rho_1}\frac{\partial m(\rho_0; Z_{i-1})}{\partial\rho_2}\,\phi_f^{(1)}(\varepsilon_i(\psi_0))\Big).$$

In view of the assumptions (A1)-(A3) and (B0)-(B4), we can deduce from the above
example that for each $(\ell, k)$, there exists a positive real-valued function
$\mu_{\ell k}$ with $E[\mu_{\ell k}(Z_0)] < \infty$ such that

$$|T_n(\psi_*; \mathbf{X}_n)_{\ell k}| \le \|\psi_* - \psi_0\|_E \sum_{i=1}^n \mu_{\ell k}(Z_{i-1}).$$

Then, for $\delta > 0$ such that $\|\psi_* - \psi_0\|_E < \delta$,
$(n\delta)^{-1}|T_n(\psi_*; \mathbf{X}_n)_{\ell k}|$ is bounded from above by
$n^{-1}\sum_{i=1}^n \mu_{\ell k}(Z_{i-1})$ which, by the ergodic theorem, converges almost
surely to $E[\mu_{\ell k}(Z_0)] < \infty$ as $n$ tends to infinity. One can thus conclude
that for all $(\ell, k)$, (3.3) holds with $T_n(\psi_*; \mathbf{X}_n)_{\ell k}$. Here also, as in
the proof of Theorem 3.1, there exists a sequence of estimators
$\psi_n = (\rho_n', \theta_n')'$ such that, a.s., $\psi_n \to \psi_0$, and for any $\epsilon > 0$, there exists
an event $\mathcal{S}_2$ with $P(\mathcal{S}_2) > 1 - \epsilon$, and an integer $n_2$ such that on
$\mathcal{S}_2$, for $n > n_2$, $\partial Q_n(\psi_n; \mathbf{X}_n) = 0$ and $Q_n(\psi; \mathbf{X}_n)$
attains a relative minimum at $\psi = \psi_n$. Since a relative minimum for
$Q_n(\psi; \mathbf{X}_n)$ is a relative maximum for $L_n(\psi; \mathbf{X}_n)$, the first part
of our result is established. For the second part, it remains to prove that
$n^{-1/2}\,\partial Q_n(\psi_0; \mathbf{X}_n)$ converges in distribution to a Gaussian random
vector with mean 0 and covariance matrix $\Lambda$. This result is handled if one puts
in Theorem 1 of Ngatchou-Wandji (2005): $U_i = \varepsilon_i(\psi_0)$, $Y_i = Z_{i-1}$,
$\Pi_1(z) = \sigma^{-1}(\theta_0; z)\,\partial m(\rho_0; z)$, $\Pi_2(z) = \sigma^{-1}(\theta_0; z)\,\partial\sigma(\theta_0; z)$,
$T_1(x) = \phi_f(x)$ and $T_2(x) = x\phi_f(x) - 1$. Finally, applying again the second
part of Theorem 3.2.23 of Taniguchi and Kakizawa (2000), one has that

$$\sqrt{n}(\psi_n - \psi_0) \longrightarrow N(0, \Sigma^{-1}\Lambda\Sigma^{-1}).$$

□



Corollary 4.1. Assume that the assumptions of Theorem 4.1 hold, and that the
equalities $\int \phi_f^{(1)}(x) f(x)\,dx = \int \phi_f^2(x) f(x)\,dx$,
$\int x\,\phi_f^{(1)}(x) f(x)\,dx = \int \phi_f(x)\,(x\phi_f(x) - 1) f(x)\,dx$ and
$\int x\,(\phi_f(x) + x\,\phi_f^{(1)}(x)) f(x)\,dx = \int (x\phi_f(x) - 1)^2 f(x)\,dx$ hold. Then

$$\sqrt{n}(\psi_n - \psi_0) \longrightarrow N(0, \Sigma^{-1}).$$

The conditions on the integrals in the above Theorem 4.1 and Corollary 4.1 are
verified at least by Gaussian density functions $f$. When those in Corollary 4.1
are satisfied, the Fisher information matrix converges to $\Sigma$. Hence, the
Cramér-Rao bound is asymptotically achieved and $\psi_n$ is asymptotically efficient.



5. Some examples

Here we list some common examples that are comprised in (1.1). It is not
difficult to see that the AR(q) models of finite order $q$, either linear or
nonlinear, are within (1.1) for $\sigma(\theta; z) = \mathrm{Cst}$. The usual ones are, for
example, AR, SETAR, TARCH, EXPAR (see Tong (1990)). Taking $m(\rho; z) = 0$ in (1.1)
yields ARCH(q) models. For finite $q$, the most popular one is the ARCH(q) model
obtained with

$$\sigma(\theta; Z_{i-1}) = \sqrt{\theta_0 + \theta_1 X_{i-1}^2 + \cdots + \theta_q X_{i-q}^2}, \quad \theta_0 > 0,\ \theta_k \ge 0,\ k = 1, \ldots, q. \tag{5.1}$$

For $q = \infty$, many other common models are within (1.1). It is the case for
invertible ARMA models. In the particular case of the MA(1) model defined by

$$X_i = \varepsilon_i + \theta\,\varepsilon_{i-1}, \quad |\theta| < 1,$$

one has $\varepsilon_i = \sum_{j \ge 0} (-\theta)^j X_{i-j}$, from which it results that

$$X_i = \theta \sum_{j \ge 0} (-\theta)^j X_{i-j-1} + \varepsilon_i.$$
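
The AR(∞) form of the invertible MA(1) model can be checked numerically. The Python sketch below simulates an MA(1) path and compares $X_i - \varepsilon_i = \theta\varepsilon_{i-1}$ with the truncated sum $\theta\sum_{j=0}^{J-1}(-\theta)^j X_{i-j-1}$; the truncation level, the seed and the helper name `ma1_ar_inf_term` are illustrative assumptions, and the truncation error is of order $|\theta|^{J+1}$.

```python
import random


def ma1_ar_inf_term(x, i, theta, n_terms):
    """Truncated sum theta * sum_{j=0}^{n_terms-1} (-theta)^j * X_{i-j-1}."""
    return theta * sum((-theta) ** j * x[i - j - 1] for j in range(n_terms))


random.seed(0)
theta = 0.5
eps = [random.gauss(0.0, 1.0) for _ in range(200)]
# x[0] lacks a lagged noise term; only indices >= 1 follow the MA(1) equation.
x = [eps[0]] + [eps[i] + theta * eps[i - 1] for i in range(1, 200)]

i = 150
approx = ma1_ar_inf_term(x, i, theta, 50)   # uses x[100], ..., x[149] only
exact = x[i] - eps[i]                       # equals theta * eps[i - 1]
print(abs(approx - exact))
```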

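As can be seen for instance in Peng and Yao (2003), GARCH(p, q) models are also
within (1.1). In the particular case of the GARCH(1,1) model defined by

$$X_i = h_i\,\varepsilon_i, \quad h_i^2 = c + a X_{i-1}^2 + b\,h_{i-1}^2, \tag{5.2}$$

it is proved that for $a + b < 1$,
$h_i^2 = \dfrac{c}{1 - b} + a \sum_{j \ge 1} b^{\,j-1} X_{i-j}^2$, and consequently,

$$X_i = \Big(\frac{c}{1 - b} + a \sum_{j \ge 1} b^{\,j-1} X_{i-j}^2\Big)^{1/2} \varepsilon_i.$$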
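
The expansion of the GARCH(1,1) conditional variance as a geometric sum of past squared observations can be verified on a toy series. The Python sketch below assumes the recursion $h_i^2 = c + aX_{i-1}^2 + bh_{i-1}^2$ and starts it at $h_0^2 = c/(1-b)$, in which case the finite recursion and the truncated expansion $c/(1-b) + a\sum_{j=1}^{i} b^{j-1}X_{i-j}^2$ agree exactly; the function names and numerical values are hypothetical.

```python
def garch_h2_recursive(x, c, a, b, h2_init):
    """Run h2[i] = c + a * x[i-1]^2 + b * h2[i-1] for i = 1, ..., len(x)."""
    h2 = [h2_init]
    for i in range(1, len(x) + 1):
        h2.append(c + a * x[i - 1] ** 2 + b * h2[i - 1])
    return h2


def garch_h2_expansion(x, c, a, b, i):
    """Truncated expansion c / (1 - b) + a * sum_{j=1}^{i} b^(j-1) * x[i-j]^2."""
    return c / (1.0 - b) + a * sum(b ** (j - 1) * x[i - j] ** 2 for j in range(1, i + 1))


x = [0.3, -1.2, 0.7, 0.1, -0.4]            # arbitrary illustrative observations
c, a, b = 0.1, 0.2, 0.5
h2 = garch_h2_recursive(x, c, a, b, h2_init=c / (1.0 - b))
print(h2[5], garch_h2_expansion(x, c, a, b, 5))
```

With any other starting value the two quantities differ by $b^i(h_0^2 - c/(1-b))$, which vanishes geometrically in $i$ when $0 \le b < 1$.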



The class of models (1.1) for $q = \infty$ also contains invertible bilinear models,
such as the subdiagonal bilinear model defined by

$$X_i = b\,X_{i-1}\,\varepsilon_{i-1} + \varepsilon_i.$$

For this model, it follows from page 103 of Taniguchi and Kakizawa (2000) that
if $b^2 < 1$, then

$$\varepsilon_i = X_i + \sum_{j \ge 1} (-b)^j \Big(\prod_{k=1}^{j} X_{i-k}\Big) X_{i-j},$$

which in turn yields

$$X_i = -\Big(\sum_{j \ge 1} (-b)^j \Big(\prod_{k=1}^{j} X_{i-k}\Big) X_{i-j}\Big) + \varepsilon_i.$$

Remark 5.1. Although conditional least-squares and conditional maximum
likelihood estimators for $\psi_0$ exist, their computation may need numerical
methods, even for Gaussian density functions $f$. For example, for pure ARCH(1)
models defined by (5.1) with $q = 1$ and Gaussian $f$, the conditional maximum
likelihood estimators will be obtained by solving in $\psi = (\theta_0, \theta_1)'$ the
equations

$$\sum_{i=1}^n \Big(\frac{1}{\theta_0 + \theta_1 X_{i-1}^2} - \frac{X_i^2}{(\theta_0 + \theta_1 X_{i-1}^2)^2}\Big) = 0,$$
$$\sum_{i=1}^n \Big(\frac{X_{i-1}^2}{\theta_0 + \theta_1 X_{i-1}^2} - \frac{X_{i-1}^2 X_i^2}{(\theta_0 + \theta_1 X_{i-1}^2)^2}\Big) = 0,$$

with the restrictions $\theta_0 > 0$ and $0 < \theta_1 < 1$. This will generally need a
numerical method. A similar remark can be made for the GARCH(1,1) and bilinear
models defined above when estimating by either the least-squares or likelihood
methods.
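
As an illustration of the numerical step mentioned in this remark, the Python sketch below maximizes the Gaussian conditional log-likelihood of an ARCH(1) model by a crude grid search over $(\theta_0, \theta_1)$ under the restrictions $\theta_0 > 0$ and $0 < \theta_1 < 1$. The grid search is a stand-in chosen for simplicity, not the method used in the paper; the function names, grid bounds, seed and sample size are all assumptions, and in practice a proper optimizer would replace the grid.

```python
import math
import random


def arch1_loglik(x, theta0, theta1):
    """Gaussian conditional log-likelihood of ARCH(1), up to an additive constant."""
    total = 0.0
    for i in range(1, len(x)):
        v = theta0 + theta1 * x[i - 1] ** 2      # conditional variance
        total += -0.5 * math.log(v) - 0.5 * x[i] ** 2 / v
    return total


def simulate_arch1(n, theta0, theta1, seed=0):
    """Simulate n steps of ARCH(1) started at 0 (no burn-in, for brevity)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        v = theta0 + theta1 * x[-1] ** 2
        x.append(math.sqrt(v) * rng.gauss(0.0, 1.0))
    return x


def grid_mle(x, step=0.05):
    """Return (log-likelihood, theta0, theta1) maximizing over a regular grid."""
    grid0 = [k * step for k in range(1, 41)]     # theta0 in (0, 2]
    grid1 = [k * step for k in range(1, 20)]     # theta1 in (0, 1)
    return max((arch1_loglik(x, t0, t1), t0, t1) for t0 in grid0 for t1 in grid1)


x = simulate_arch1(500, 0.5, 0.3)
best_ll, t0_hat, t1_hat = grid_mle(x)
print(t0_hat, t1_hat)
```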



6. Kernel estimator for the noise's density and its derivatives

In time series analysis, the conditional distribution can be very useful for the
study of nonlinear phenomena such as the symmetry and the multimodality
structure of a time series. In the setting of models (1.1), the conditional
distribution is the distribution of the noise. Nonparametric estimation of the
conditional distribution has been studied among others by Hyndman and Yao
(2002), who use a kernel method, Fan, Yao and Tong (1996), who use the local
polynomials approach, Fan and Yim (2004), who use a cross-validation method, and
Hyndman and Yao (2002), who use a kernel method and derive a test for
conditional symmetry from their estimator. Bai and Ng (2001) also derive a test
for conditional symmetry which rests on the kernel estimators of the conditional
density and its derivatives.

In this section, we assume that the $\varepsilon_i$'s have an unknown uniformly
continuous density function $f$, and we define its kernel estimator and those of
its derivatives. We show the uniform consistency of these estimators. The
results of this section can lead to the derivation of adaptive estimators for
$\psi_0$, or to the construction of some goodness-of-fit tests for the function $f$.
However, we will not study these problems here.

For all $i \in \mathbb{Z}$ and $\psi = (\rho', \theta')' \in \Psi$, define the random function

$$\varepsilon_i(\psi) = \frac{X_i - m(\rho; Z_{i-1})}{\sigma(\theta; Z_{i-1})}. \tag{6.1}$$

Let $\psi_n = (\rho_n', \theta_n')'$ be any consistent estimator of $\psi_0$ such that
$n^{1/2}(\psi_n - \psi_0)$ converges in distribution to a Gaussian distribution with mean
0 and variance matrix $\Gamma$. Take for example the least-squares estimator of
Section 3 or the pseudo-maximum likelihood estimator which can be obtained from
Section 4 with Gaussian $f$. Let $p$ be a nonnegative integer and $K$ be a kernel
function differentiable up to order $p + 1$, with modulus of continuity
$\omega_K$. Let $(h_n)$ be a sequence of positive real numbers such that $h_n \to 0$ as
$n$ tends to infinity. For $n = 1, 2, \ldots$, and for all $x \in \mathbb{R}$, define the
random functions

$$f_n^{(p)}(\psi; x) = \frac{1}{n\,h_n^{p+1}} \sum_{i=1}^n K^{(p)}\Big(\frac{x - \varepsilon_i(\psi)}{h_n}\Big). \tag{6.2}$$
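
For $p = 0$, the estimator above is an ordinary kernel density estimate computed on the standardized residuals. The following Python sketch, for a one-lag model and a Gaussian kernel, computes the residuals and evaluates the estimate at a point; the helper names (`gaussian_kernel`, `residuals`, `kde`) and the toy inputs are assumptions made for illustration only.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def residuals(x, m, sigma):
    """eps_i = (X_i - m(Z_{i-1})) / sigma(Z_{i-1}) with Z_{i-1} = x[i-1]."""
    return [(x[i] - m(x[i - 1])) / sigma(x[i - 1]) for i in range(1, len(x))]


def kde(res, t, h):
    """Kernel density estimate (n h)^{-1} * sum_i K((t - eps_i) / h)."""
    return sum(gaussian_kernel((t - e) / h) for e in res) / (len(res) * h)


# Toy check: a single residual at 0, bandwidth 1, evaluated at 0.
res0 = residuals([0.0, 0.0], m=lambda z: 0.0, sigma=lambda z: 1.0)
print(kde(res0, 0.0, 1.0))

# The estimate integrates to one (Riemann sum on a wide grid).
res1 = [-1.0, 0.5, 2.0]
mass = sum(kde(res1, -8.0 + 0.01 * k, 0.5) for k in range(1601)) * 0.01
print(mass)
```

In the setting of this section, `m` and `sigma` would be evaluated at the estimated parameters, so that the residuals are computable from the data.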



For observable $\varepsilon_i(\psi)$'s, the convergence of the above Bhattacharya
estimators for $f^{(p)}(x)$ is studied in Silverman (1978). Here, the $\varepsilon_i(\psi_0)$'s
are not observable and it is natural to estimate $f^{(p)}(x)$ by
$f_n^{(p)}(\psi_n; x)$. Following Singh (1979), or the more recent paper of Horová,
Vieu and Zelinka (2002), other estimators of $f^{(p)}(x)$ could be defined. The
results of this section are established under the following assumptions of
Silverman (1978):

(H1) $K$ is uniformly continuous with bounded variation
(H2) $\int |K(x)|\,dx < \infty$ and $K(x) \to 0$ as $|x| \to \infty$
(H3) $\int K(x)\,dx = 1$
(H4) $\int |x \log(|x|)|^{1/2}\,|dK(x)| < \infty$
(H5) $\int_0^1 [\log(1/u)]^{1/2}\,d\tau(u) < \infty$, where $\tau(u) = [\omega_K(u)]^{1/2}$
(H6) For $j = 0, 1, \ldots, p$, $K^{(j)}(x) \to 0$ as $|x| \to \infty$ and $\int |K^{(p+1)}(x)|\,dx < \infty$
(H7) The Fourier transform of $K$ is not identically one in any neighborhood of 0.

In Silverman (1978), the assumptions (H1)-(H5) are needed for the convergence of
$f_n(\psi_0; x)$ to $f(x)$, while (H1), (H2), (H4), (H5)-(H7) allow for the
convergence of $f_n^{(p)}(\psi_0; x)$ to $f^{(p)}(x)$, $p \ge 1$. These assumptions hold at
least for Gaussian kernels.

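We have the following theorem:

Theorem 6.1. Assume (B0) and (H6) hold, and the function $K^{(p+1)}$ is
continuous. Let $r$ be any integer such that $0 \le r \le p$. Assume
$n^{1/2} h_n^{p+2} \to \infty$ as $n \to \infty$. Then

$$\sup_{x\in\mathbb{R}} |f_n^{(r)}(\psi_0; x) - f_n^{(r)}(\psi_n; x)| = o_P(1).$$

Proof. Let $0 \le r \le p$, $r$ integer. By a Taylor expansion of order one, we have,
for some vector $\psi_*$ lying between $\psi_0$ and $\psi_n$,

$$f_n^{(r)}(\psi_0; x) - f_n^{(r)}(\psi_n; x) = \frac{1}{n\,h_n^{r+1}} \sum_{i=1}^n \Big[K^{(r)}\Big(\frac{x - \varepsilon_i(\psi_0)}{h_n}\Big) - K^{(r)}\Big(\frac{x - \varepsilon_i(\psi_n)}{h_n}\Big)\Big]$$
$$= -\frac{1}{n\,h_n^{r+2}} \sum_{i=1}^n K^{(r+1)}\Big(\frac{x - \varepsilon_i(\psi_*)}{h_n}\Big)\,\partial'\varepsilon_i(\psi_*)\,(\psi_0 - \psi_n).$$

By the triangle inequality, it then follows that

$$\sup_{x\in\mathbb{R}} |f_n^{(r)}(\psi_0; x) - f_n^{(r)}(\psi_n; x)| \le \frac{1}{n\,h_n^{r+2}} \sum_{i=1}^n \sup_{\psi\in\Psi}\sup_{x\in\mathbb{R}} \Big|K^{(r+1)}\Big(\frac{x - \varepsilon_i(\psi)}{h_n}\Big)\Big| \times \sup_{\psi\in\Psi}\|\partial'\varepsilon_i(\psi)\|_E\,\|\psi_0 - \psi_n\|_E$$
$$= \frac{1}{n^{3/2}\,h_n^{r+2}} \sum_{i=1}^n \sup_{\psi\in\Psi}\sup_{x\in\mathbb{R}} \Big|K^{(r+1)}\Big(\frac{x - \varepsilon_i(\psi)}{h_n}\Big)\Big| \times \sup_{\psi\in\Psi}\|\partial'\varepsilon_i(\psi)\|_E\,\|\sqrt{n}(\psi_0 - \psi_n)\|_E.$$

Since the function $K^{(r+1)}$ is continuous, by (H6) it is bounded, and there
exists a constant $C > 0$ such that almost surely,

$$\sup_{\psi\in\Psi}\sup_{x\in\mathbb{R}} \Big|K^{(r+1)}\Big(\frac{x - \varepsilon_i(\psi)}{h_n}\Big)\Big| \le C.$$

Also, under (B0) and (A1), one can find a positive function $\chi(z)$ with
$E[\chi^4(Z_0)] < \infty$ such that for all $1 \le i \le n$,

$$\sup_{\psi\in\Psi} \|\partial'\varepsilon_i(\psi)\|_E \le \chi(Z_{i-1}).$$

From these two inequalities, we have

$$\sup_{x\in\mathbb{R}} |f_n^{(r)}(\psi_0; x) - f_n^{(r)}(\psi_n; x)| \le \frac{C}{n^{1/2}\,h_n^{r+2}} \Big[\frac{1}{n}\sum_{i=1}^n \chi(Z_{i-1})\Big]\,\|\sqrt{n}(\psi_0 - \psi_n)\|_E.$$

By our assumptions, $\|\sqrt{n}(\psi_0 - \psi_n)\|_E$ converges in distribution to
$\|N(0, \Gamma)\|_E$. By the ergodic theorem, almost surely,

$$\frac{1}{n}\sum_{i=1}^n \chi(Z_{i-1}) \longrightarrow E[\chi(Z_0)].$$

The result then follows by the fact that $n^{1/2} h_n^{r+2} \to \infty$ as $n \to \infty$. □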
An immediate consequence of both Theorem 6.1 and Theorems A and C of Silverman
(1978) is the following corollary.

Corollary 6.1. Assume that (B0) holds and the function $K^{(p+1)}$ is continuous.

(i) Assume that (H1)-(H4) hold and that $n^{1/2} h_n^2 \to \infty$ and
$(n h_n)^{-1}\log(n) \to 0$ as $n \to \infty$. Then uniformly, in probability,
$f_n(\psi_n; x)$ converges to $f(x)$.

(ii) For $p > 0$, assume that the function $K^{(p)}$ satisfies (H1)-(H2),
(H4)-(H7), and $n^{1/2} h_n^{p+2} \to \infty$, $n^{-1} h_n^{-2p-1}\log(1/h_n) \to 0$ as
$n \to \infty$. Then uniformly, in probability, $f_n^{(p)}(\psi_n; x)$ converges to
$f^{(p)}(x)$.

Proof. By the triangle inequality, write, for $r = 0$ or $r = p$,

$$\sup_{x\in\mathbb{R}} |f_n^{(r)}(\psi_n; x) - f^{(r)}(x)| \le \sup_{x\in\mathbb{R}} |f^{(r)}(x) - f_n^{(r)}(\psi_0; x)| + \sup_{x\in\mathbb{R}} |f_n^{(r)}(\psi_0; x) - f_n^{(r)}(\psi_n; x)|,$$

and apply Theorem 6.1 and Theorems A and C of Silverman (1978). □

Remark 6.1. Take $h_n = \mathrm{Cst}\cdot n^{-1/9}$. Then one has $n^{1/2} h_n^2 \to \infty$ and
$(n h_n)^{-1}\log(n) \to 0$ as $n \to \infty$, which satisfies the hypotheses of part (i) of
the above corollary. For $1 \le p \le 2$, one has $n^{1/2} h_n^{p+2} \to \infty$ and
$n^{-1} h_n^{-2p-1}\log(1/h_n) \to 0$ as $n \to \infty$, and the requirements of part (ii)
are satisfied.
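
The rates in this remark follow from the exponents: with $h_n = n^{-1/9}$, $n^{1/2}h_n^{p+2} = n^{1/2 - (p+2)/9}$, whose exponent is positive for $p \le 2$, and $n^{-1}h_n^{-2p-1} = n^{-1 + (2p+1)/9}$, whose exponent is negative. The short Python sketch below evaluates both quantities for growing $n$; the function names and the constant `c = 1` are illustrative assumptions.

```python
import math


def bandwidth(n, c=1.0):
    """Bandwidth h_n = c * n^(-1/9)."""
    return c * n ** (-1.0 / 9.0)


def rate_terms(n, p, c=1.0):
    """Return (n^(1/2) * h^(p+2), log(1/h) / (n * h^(2p+1))) for h = bandwidth(n)."""
    h = bandwidth(n, c)
    growth = math.sqrt(n) * h ** (p + 2)                   # should diverge
    decay = math.log(1.0 / h) / (n * h ** (2 * p + 1))     # should vanish
    return growth, decay


for n in (1e6, 1e12):
    print(n, rate_terms(n, 1), rate_terms(n, 2))
```

For $p = 3$ the first exponent becomes $1/2 - 5/9 < 0$, which is why the remark stops at $p = 2$.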



7. Simulation study 

To illustrate some of our results, we conducted a simulation experiment that 
we present and comment in this last section. We restricted to models for which 
we could obtain explicit and simple expressions for the estimators. This avoided 
the use of numerical methods. We consider the following models : 

Xi = [ Pa + p x cxp(- K X 2 _ 1 )]X J _ 1 + y fe +e 1 Xf_ 1 E i , (7.1) 

where the parameters $\rho_0$, $\rho_1$, $\kappa \ge 0$, $\theta_0 > 0$ and $\theta_1 \ge 0$ possibly satisfy some conditions ensuring the existence, the invertibility, the stationarity and the ergodicity of $(X_i)_{i \in \mathbb{Z}}$. For example, for model (ii) below, (7.1) admits a strictly stationary and geometrically ergodic solution $(X_i)_{i \in \mathbb{Z}}$ as soon as $0 < \theta_1 < 1$. The noise densities $f$ that we used were either Gaussian or Laplace. More precisely, we studied the cases

(i) $\rho_0 = 0$, $-1 < \rho_1 < 1$, $\kappa = 0.1$ and $\theta_1 = 0$, with $f$ either Gaussian or Laplace.
(ii) $\rho_0 = 0$, $\rho_1 = 0$, $\kappa = 0$ and $0 < \theta_1 < 1$, with $f$ Gaussian.
(iii) $\rho_1 = 0$, $\kappa = 0$ and $0 < \theta_1 < 1$, with $f$ Gaussian.

Except for model (i) with Gaussian $f$ and model (ii), there is no guarantee that $(X_i)_{i \in \mathbb{Z}}$ is stationary and/or ergodic for the other models.
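
A path of model (7.1) can be generated recursively. The sketch below is illustrative only: the function name `simulate_model`, the burn-in length, the zero starting value and the unit-variance scaling of the Laplace noise are assumptions made here, not choices stated in the text.

```python
import numpy as np

def simulate_model(rho0, rho1, kappa, theta0, theta1, n,
                   noise="gaussian", burn=200, seed=0):
    """Simulate n observations of model (7.1) after a burn-in period."""
    rng = np.random.default_rng(seed)
    if noise == "gaussian":
        eps = rng.standard_normal(n + burn)
    elif noise == "laplace":
        # Assumption: Laplace noise rescaled to unit variance (variance = 2 * scale**2).
        eps = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=n + burn)
    else:
        raise ValueError("noise must be 'gaussian' or 'laplace'")

    x = np.empty(n + burn)
    prev = 0.0  # assumed starting value, forgotten after the burn-in
    for i in range(n + burn):
        cond_mean = (rho0 + rho1 * np.exp(-kappa * prev**2)) * prev
        cond_scale = np.sqrt(theta0 + theta1 * prev**2)
        prev = cond_mean + cond_scale * eps[i]
        x[i] = prev
    return x[burn:]

# Model (i) with rho_1 = -0.5, theta_0 = 1 and Laplace noise, sample size n = 50.
x_i = simulate_model(0.0, -0.5, 0.1, 1.0, 0.0, n=50, noise="laplace")
```

Cases (ii) and (iii) are obtained by setting `kappa=0.0` and `rho1=0.0`, with `rho0=0.0` for the pure ARCH(1) case.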

For the computation of the least-squares estimators, the weight functions were $\chi(z) = \gamma(z) = 1$, which yields the classical least-squares estimators. In each case,

Table 1

Conditional least-squares estimator for the parameters of model (i) with Gaussian noises (middle two columns) and Laplace noises (last two columns) and sample size n = 50

| $\rho_1$ | $\theta_0$ | $\hat\rho_1$ (Gaussian) | $\hat\theta_0$ (Gaussian) | $\hat\rho_1$ (Laplace) | $\hat\theta_0$ (Laplace) |
|---|---|---|---|---|---|
| -0.80 | 0.10 | -0.77 | 0.098 | -0.778 | 0.097 |
| -0.50 | 0.50 | -0.47 | 0.493 | -0.482 | 0.486 |
| 0.20 | 0.10 | 0.191 | 0.098 | 0.197 | 0.098 |
| 0.80 | 0.80 | 0.763 | 0.795 | 0.782 | 0.779 |
| 0.90 | 1.00 | 0.864 | 0.998 | 0.886 | 0.976 |

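Table 2

Conditional least-squares estimator for the parameters of ARCH(1) model (ii) ($\rho_0 = \rho_1 = 0$) and Gaussian noises, for sample sizes n = 100, n = 200 and n = 400

| $\theta_0$ | $\theta_1$ | $\hat\theta_0$ (n = 100) | $\hat\theta_1$ (n = 100) | $\hat\theta_0$ (n = 200) | $\hat\theta_1$ (n = 200) | $\hat\theta_0$ (n = 400) | $\hat\theta_1$ (n = 400) |
|---|---|---|---|---|---|---|---|
| 0.40 | 0.30 | 0.433 | 0.210 | 0.428 | 0.243 | 0.418 | 0.264 |
| 0.50 | 0.20 | 0.525 | 0.146 | 0.515 | 0.169 | 0.510 | 0.180 |
| 0.30 | 0.10 | 0.304 | 0.075 | 0.304 | 0.089 | 0.300 | 0.095 |
| 0.40 | 0.40 | 0.472 | 0.271 | 0.451 | 0.306 | 0.442 | 0.329 |
| 0.60 | 0.05 | 0.610 | 0.031 | 0.608 | 0.036 | 0.603 | 0.044 |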



our estimates were computed on the basis of 1,000 samples of length n. For model (i), simple computations show that the least-squares estimator coincides with the maximum likelihood estimator for Gaussian $f$. The results concerning this model are listed in Table 1. They show that, for samples of size n = 50, and for either density considered, the least-squares estimators are close to the true values of the parameters. A quick calculation shows that these estimators are unbiased. The trials for this model were also done for n > 100. In these results, which we do not present here, the estimates obtained were more accurate.
Concerning model (ii), the results were in general better for the maximum likelihood estimators than for the least-squares estimators, for all the sample sizes n = 100, 200, 400 (see Tables 2 and 3). Both estimators moved toward the true parameter as n grew.
For model (iii), only least-squares estimators were computed. This was done for n = 100, 200, 400. We observed in these cases that the estimates of $\rho_0$ were good, while $\theta_0$ was always overestimated and $\theta_1$ was underestimated (see Table 4). The least-squares estimates for model (ii) also behaved this way. This likely comes from the fact that the least-squares estimators for these models are highly biased. It seems from our simulation experiment that their bias converges slowly to 0 as n grows.
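
The behaviour reported for model (ii) can be explored with a small Monte Carlo sketch. It assumes that, with unit weights, the conditional least-squares estimator of $(\theta_0, \theta_1)$ in the ARCH(1) case reduces to an ordinary regression of $X_i^2$ on $(1, X_{i-1}^2)$. That reading, the helper names `simulate_arch1` and `cls_arch1`, the burn-in length, the number of replications and the seed are all illustrative assumptions, not the author's code.

```python
import numpy as np

def simulate_arch1(theta0, theta1, n, reps, rng, burn=100):
    """Simulate `reps` independent Gaussian ARCH(1) paths of length n."""
    x = np.zeros(reps)               # assumed starting value for every path
    paths = np.empty((reps, n))
    for t in range(burn + n):
        x = np.sqrt(theta0 + theta1 * x**2) * rng.standard_normal(reps)
        if t >= burn:
            paths[:, t - burn] = x
    return paths

def cls_arch1(path):
    """Least-squares fit of X_i^2 on (1, X_{i-1}^2); returns (theta0_hat, theta1_hat)."""
    y = path[1:] ** 2
    design = np.column_stack([np.ones(y.size), path[:-1] ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
paths = simulate_arch1(theta0=0.4, theta1=0.3, n=100, reps=500, rng=rng)
estimates = np.array([cls_arch1(p) for p in paths])
mean_estimates = estimates.mean(axis=0)   # Monte Carlo averages of the estimates
print(mean_estimates)
```

Under these assumptions, comparing `mean_estimates` with the true values (0.4, 0.3) for increasing `n` gives a quick way to see how slowly the bias shrinks.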

Table 3

Conditional maximum likelihood estimator for the parameters of ARCH(1) model (ii) ($\rho_0 = \rho_1 = 0$) and Gaussian noises, for sample sizes n = 100, n = 200 and n = 400

| $\theta_0$ | $\theta_1$ | $\hat\theta_0$ (n = 100) | $\hat\theta_1$ (n = 100) | $\hat\theta_0$ (n = 200) | $\hat\theta_1$ (n = 200) | $\hat\theta_0$ (n = 400) | $\hat\theta_1$ (n = 400) |
|---|---|---|---|---|---|---|---|
| 0.40 | 0.30 | 0.413 | 0.268 | 0.407 | 0.284 | 0.401 | 0.297 |
| 0.50 | 0.20 | 0.508 | 0.175 | 0.505 | 0.188 | 0.503 | 0.191 |
| 0.30 | 0.10 | 0.294 | 0.105 | 0.300 | 0.100 | 0.299 | 0.100 |
| 0.40 | 0.40 | 0.415 | 0.364 | 0.406 | 0.381 | 0.402 | 0.393 |
| 0.60 | 0.05 | 0.583 | 0.067 | 0.593 | 0.055 | 0.596 | 0.051 |

Table 4

Conditional least-squares estimator for the parameters of model (iii) with $\rho_1 = 0$ and Gaussian noises, for sample sizes n = 100, n = 200 and n = 400

| $\rho_0$ | $\theta_0$ | $\theta_1$ | $\hat\rho_0$ (n = 100) | $\hat\theta_0$ (n = 100) | $\hat\theta_1$ (n = 100) | $\hat\rho_0$ (n = 200) | $\hat\theta_0$ (n = 200) | $\hat\theta_1$ (n = 200) | $\hat\rho_0$ (n = 400) | $\hat\theta_0$ (n = 400) | $\hat\theta_1$ (n = 400) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.20 | 0.40 | 0.30 | 0.189 | 0.447 | 0.184 | 0.196 | 0.435 | 0.219 | 0.198 | 0.424 | 0.253 |
| 0.30 | 0.50 | 0.20 | 0.292 | 0.534 | 0.131 | 0.292 | 0.524 | 0.155 | 0.295 | 0.517 | 0.173 |
| 0.50 | 0.30 | 0.10 | 0.491 | 0.313 | 0.060 | 0.494 | 0.307 | 0.075 | 0.494 | 0.303 | 0.085 |
| 0.60 | 0.40 | 0.05 | 0.582 | 0.411 | 0.025 | 0.591 | 0.408 | 0.033 | 0.594 | 0.405 | 0.040 |
| 0.40 | 0.40 | 0.10 | 0.390 | 0.415 | 0.062 | 0.389 | 0.410 | 0.077 | 0.399 | 0.405 | 0.086 |



Fig 1. Estimation of the density of the noise in models (i), (ii) and (iii) for n = 100, 200, 400, 600.

Fig 2. Estimation of the first derivative of the density of the noise in models (i), (ii) and (iii) for n = 100, 200, 400, 600.

For the kernel density estimation, we restricted our trials to models (i)-(iii) with Gaussian density. The residuals were computed from the least-squares fit with $\gamma(z) = \chi(z) = 1$. We took $\psi_n = \hat\psi_n$ (see Sections 3 and 5), a Gaussian kernel with $h_n = c_n n^{-1/9}$, where, denoting by $\sigma_n$ the empirical standard deviation and $X_{n1}$ and $X_{n3}$ the first and third empirical quartiles of $(X_1, \ldots, X_n)$,

$$c_n = 0.9 \min\left\{ \sigma_n, \frac{X_{n3} - X_{n1}}{1.34} \right\}.$$

This sequence $(c_n)$, given in the software R, seemed to give better results than $c_n = \sigma_n$. It is easy to check that the Gaussian kernel and the sequences $(h_n)$ satisfy the assumptions of Theorem 6.1. We took $\rho_1 = -0.5$, $\theta_0 = 1$ for (i), $\theta_0 = 0.4$, $\theta_1 = 0.1$ for (ii) and $\rho_0 = 0.6$, $\theta_0 = 0.4$, $\theta_1 = 0.05$ for (iii). The different plots of $f_n$ and $f$ are gathered on the same graph. The same is done for $f_n^{(1)}$ and $f^{(1)}$. The trials were done for n = 100, 200, 400, 600. The estimates obtained for the density were good (see Figure 1). Those of the derivative of the density were not good for n = 100, 200, especially in the vicinity of the maxima. They were better for n = 400, 600 (see Figure 2). For the density and its first derivative, one can see that the estimates from models (ii) and (iii) were not very close to the true functions. This is probably due to sampling fluctuations or to the bias of the least-squares estimators of the parameters of these models. The good behavior of the estimates obtained from (i) may come from the fact that the conditional likelihood and the conditional least-squares estimators of the parameter $\psi_0$ are the same in this case, as we pointed out earlier.
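
The bandwidth rule and the Gaussian-kernel estimates of the density and of its first derivative can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the rule is applied here to the sample passed in (the text defines the quartiles on $(X_1, \ldots, X_n)$), and standard normal draws stand in for the residuals of a fitted model.

```python
import numpy as np

def bandwidth(sample):
    """h_n = c_n * n^(-1/9), with c_n = 0.9 * min(sd, IQR / 1.34)."""
    n = sample.size
    sigma_n = sample.std(ddof=1)
    q1, q3 = np.percentile(sample, [25, 75])
    c_n = 0.9 * min(sigma_n, (q3 - q1) / 1.34)
    return c_n * n ** (-1.0 / 9.0)

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def density_and_derivative(residuals, grid, h):
    """Kernel estimates of f and f' on `grid` from the residuals."""
    u = (grid[:, None] - residuals[None, :]) / h
    k = gaussian_kernel(u)
    dens = k.mean(axis=1) / h               # (1/(n h)) * sum K(u)
    deriv = (-u * k).mean(axis=1) / h**2    # (1/(n h^2)) * sum K'(u), K'(u) = -u K(u)
    return dens, deriv

# Stand-in residuals (assumption): i.i.d. standard normal draws, n = 400.
rng = np.random.default_rng(0)
residuals = rng.standard_normal(400)
grid = np.linspace(-6.0, 6.0, 1201)
h = bandwidth(residuals)
dens, deriv = density_and_derivative(residuals, grid, h)
```

In practice `residuals` would be replaced by the standardized residuals of the fitted model, and `dens` and `deriv` plotted against the true $f$ and $f^{(1)}$ as in Figures 1 and 2.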